Noise and Channel Normalized Cepstral Features for Far-speech Recognition
نویسندگان
چکیده
The paper analyses suitable features for distorted speech recognition. The aim is to explore the application of command ASR system when the speech is recorded with far-distance microphones with a possible strong additive and convolutory noise. The paper analyses feasible contribution of basic spectral subtraction coupled with cepstral mean normalization in minimizing of the influence of present distortion in such far-talk channel. The results are compared with reference close-talk speech recognition system. The results show the improvement in WER for channels with low or medium SNR. Using the combination of these basic techniques WERR of 55.6% was obtained for medium distance channel and WERR of 22.5% for far distance channel.
منابع مشابه
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
متن کاملSmoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition
In this paper we present a robust feature extractor that includes the use of a smoothed nonlinear energy operator (SNEO)-based amplitude modulation features for a large vocabulary continuous speech recognition (LVCSR) task. SNEO estimates the energy required to produce the AM-FM signal, and then the estimated energy is separated into its amplitude and frequency components using an energy separa...
متن کاملDWT and LPC based feature extraction methods for isolated word recognition
In this article, new feature extraction methods, which utilize wavelet decomposition and reduced order linear predictive coding (LPC) coefficients, have been proposed for speech recognition. The coefficients have been derived from the speech frames decomposed using discrete wavelet transform. LPC coefficients derived from subband decomposition (abbreviated as WLPC) of speech frame provide bette...
متن کاملVoice-based Age and Gender Recognition using Training Generative Sparse Model
Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...
متن کاملRobust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique
In this paper we introduce a robust feature extractor, dubbed as robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric and level-dependent compressive gammachirp filterbank and a sigmoid shape weighting rule for the enhancement of speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in ad...
متن کامل